Add PostgreSQL observability telemetry exposure #1808
mploski merged 12 commits into feature/database-controllers
Conversation
a1b796f to 976ecd1
CLA Assistant Lite bot: I have read the CLA Document and I hereby sign the CLA. 1 out of 3 committers have signed the CLA.
d710f58 to 63b5937
988138d to 08dfa16
```go
oldConditions := make([]metav1.Condition, len(postgresCluster.Status.Conditions))
copy(oldConditions, postgresCluster.Status.Conditions)
```

```go
if err := reconcilePostgreSQLMetricsService(ctx, c, rc.Scheme, postgresCluster, postgresMetricsEnabled); err != nil {
```
This could potentially be packed into a unit-of-work style code block, as it's very repeatable across the logic statements; it could be extracted into an executable interface that handles the component metrics, or something along those lines.
The code would then be separated into testable blocks and orchestrated cleanly.
```go
	return ctrl.Result{}, errors.Join(err, statusErr)
}
```

```go
postgresMetricsEnabled := isPostgreSQLMetricsEnabled(postgresCluster, clusterClass)
```
Just an idea: could that be merged into a general port, something like "is metrics exposure enabled for this component?" There is always the possibility to expand later without adding isComponent1MetricsEnabled … isComponentNMetricsEnabled; just one method, getComponentMetricsSettings(...) returning map[string]bool keyed by component. One config poll for anything that comes in the future.
Let's do this as part of the refactor initiative. I'm not sure whether this should be a separate port, though, as it is tightly coupled to the low-level cluster provider.
Since the cluster provider is still orchestrated/managed by k8s, I would say it's "portable". Yeah, let's analyse it along the way later.
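The idea discussed above could be sketched roughly like this. Everything here is hypothetical: the struct shapes stand in for the real class/cluster API types, and `getComponentMetricsSettings` is the reviewer's proposed name, not an existing function. The enable/disable semantics (class enables, cluster may opt out) follow the PR description.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real class- and cluster-level API types;
// only the enabled/disabled flags matter for this sketch.
type classMonitoring struct {
	PostgresqlMetricsEnabled       bool
	ConnectionPoolerMetricsEnabled bool
}

type clusterMonitoring struct {
	PostgresqlMetricsDisabled       bool
	ConnectionPoolerMetricsDisabled bool
}

// getComponentMetricsSettings resolves all component flags in one poll,
// instead of a growing family of isComponentNMetricsEnabled helpers.
// Class config enables a component; cluster config can only opt out.
func getComponentMetricsSettings(class classMonitoring, cluster clusterMonitoring) map[string]bool {
	return map[string]bool{
		"postgresql":       class.PostgresqlMetricsEnabled && !cluster.PostgresqlMetricsDisabled,
		"connectionPooler": class.ConnectionPoolerMetricsEnabled && !cluster.ConnectionPoolerMetricsDisabled,
	}
}

func main() {
	settings := getComponentMetricsSettings(
		classMonitoring{PostgresqlMetricsEnabled: true, ConnectionPoolerMetricsEnabled: true},
		clusterMonitoring{ConnectionPoolerMetricsDisabled: true},
	)
	fmt.Println(settings["postgresql"], settings["connectionPooler"]) // true false
}
```

Adding a new component then means adding one map entry rather than a new helper plus a new call site in the reconciler.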
```go
}

func normalizeCNPGClusterSpec(spec cnpgv1.ClusterSpec, customDefinedParameters map[string]string) normalizedCNPGClusterSpec {
	normalized := normalizedCNPGClusterSpec{
```
We could potentially map it via a JSON contract, unless the tags in our specs are already busy, which they probably are. If not, it would be mapped straight into the cnpg spec.
Btw, what do inheritedAnnotations mean, tech-wise?
Inherited annotations set annotations on the k8s pods. Thanks to this we have a way to discover every pod with those annotations, and this is what the OTel collector uses to find pod endpoints to scrape.
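A small sketch of what an annotation builder along the lines of the PR's `buildPostgresScrapeAnnotations` might look like. The annotation keys follow the common `prometheus.io/*` pod-annotation convention; the actual constant names and keys in the PR may differ, and `buildScrapeAnnotations` here is an illustrative name.

```go
package main

import (
	"fmt"
	"strconv"
)

// Conventional Prometheus pod-annotation keys; the PR's actual constants
// (e.g. prometheusPortAnnotation in the test diff) may differ.
const (
	prometheusScrapeAnnotation = "prometheus.io/scrape"
	prometheusPortAnnotation   = "prometheus.io/port"
)

// buildScrapeAnnotations builds the scrape annotations that CNPG copies onto
// every pod via InheritedMetadata.Annotations; a collector then discovers
// scrape targets by matching these annotations on pods.
func buildScrapeAnnotations(port int) map[string]string {
	return map[string]string{
		prometheusScrapeAnnotation: "true",
		prometheusPortAnnotation:   strconv.Itoa(port),
	}
}

func main() {
	// 9187 is the PostgreSQL exporter port from the PR description.
	fmt.Println(buildScrapeAnnotations(9187))
}
```

Because the annotations are inherited by every pod of the cluster, no per-cluster `Service` or `ServiceMonitor` object is needed for discovery.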
```go
assert.Equal(t, postgresMetricsPortString, cluster.Spec.InheritedMetadata.Annotations[prometheusPortAnnotation])
}
```

```go
func TestClusterSecretExists(t *testing.T) {
```
nit: Isn't it more readable to split the units into chunks?
A naming idea, just for reference: TestClusterSecret_with_n_expected_rwro_poolers_exist.
We decided to follow table-driven tests for tests that use the same methods but have different input/output expectations.
I thought that might be the case. Backing off the nit, all fine!
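For reference, the table-driven pattern the maintainers mention looks roughly like this. In the real test file this loop would run under `go test` with `t.Run(tc.name, ...)` per case; it is shown here as a plain program, with `isMetricsEnabled` as a hypothetical stand-in for the PR's flag-resolution helpers.

```go
package main

import "fmt"

// isMetricsEnabled is a stand-in flag-resolution helper so the table-driven
// pattern below is self-contained; the real tests exercise the PR's helpers.
func isMetricsEnabled(classEnabled, clusterDisabled bool) bool {
	return classEnabled && !clusterDisabled
}

func main() {
	// Table-driven style: one method under test, one named row per
	// input/output expectation, instead of a separate test per case.
	cases := []struct {
		name            string
		classEnabled    bool
		clusterDisabled bool
		want            bool
	}{
		{"enabled by class", true, false, true},
		{"disabled by cluster override", true, true, false},
		{"disabled by class", false, false, false},
	}
	for _, tc := range cases {
		if got := isMetricsEnabled(tc.classEnabled, tc.clusterDisabled); got != tc.want {
			fmt.Printf("FAIL %s: got %v, want %v\n", tc.name, got, tc.want)
			return
		}
	}
	fmt.Println("all cases pass")
}
```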
e3f97e8 to 6638532
6638532 to 3b83475
Description
Adds PostgreSQL observability telemetry for `PostgresCluster` using Prometheus pod-annotation-based scraping. Metrics are exposed by CNPG's built-in exporters on PostgreSQL pods (port `9187`) and PgBouncer pooler pods (port `9127`). The operator controls whether annotations are injected via class- and cluster-level configuration, with no dedicated metrics `Service` or `ServiceMonitor` required for PostgreSQL or PgBouncer scraping. A `ServiceMonitor` is still supported for operator-controller metrics as an optional step.
Key Changes
- `api/v4/postgresclusterclass_types.go` — Added class-level observability configuration (`monitoring.postgresqlMetrics.enabled`, `monitoring.connectionPoolerMetrics.enabled`) that controls whether scrape annotations are injected into CNPG pods.
- `api/v4/postgrescluster_types.go` — Added cluster-level disable-only overrides (`spec.monitoring.postgresqlMetrics.disabled`, `spec.monitoring.connectionPoolerMetrics.disabled`) allowing per-cluster opt-out without changing the class.
- `pkg/postgresql/cluster/core/cluster.go` — Wired observability flag resolution into `PostgresCluster` reconciliation. When enabled, sets `InheritedMetadata.Annotations` on the CNPG `Cluster` (for PostgreSQL pods) and `Template.ObjectMeta.Annotations` on CNPG `Pooler` resources (for PgBouncer pods).
- `pkg/postgresql/cluster/core/monitoring.go` — Added `isPostgreSQLMetricsEnabled`/`isConnectionPoolerMetricsEnabled` flag-resolution helpers, `buildPostgresScrapeAnnotations`/`buildPoolerScrapeAnnotations` annotation builders, and `removeScrapeAnnotations` for the disable path.
- `pkg/postgresql/cluster/core/monitoring_unit_test.go` — Added unit tests for flag resolution, scrape annotation builders, and annotation removal.
- `internal/controller/postgrescluster_controller_test.go` — Added integration tests verifying that `InheritedMetadata` annotations are set on the CNPG `Cluster` when monitoring is enabled and removed when disabled by cluster override.
- `docs/PostgreSQLObservabilityDashboard.json` — Reference Grafana dashboard covering PostgreSQL target count, RW/RO PgBouncer availability, WAL activity, database sizes, PgBouncer client load, controller reconcile metrics, and domain fleet metrics.
- `docs/postgresSQLMonitoring-e2e.md` — End-to-end validation guide for the annotation-based scraping flow on KIND.
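The disable path listed above can be sketched in a few lines. This is an illustrative take on what a `removeScrapeAnnotations`-style helper does, not the PR's actual implementation; the annotation keys follow the common `prometheus.io/*` convention and may not match the PR's constants.

```go
package main

import "fmt"

// Conventional Prometheus scrape-annotation keys; the PR's constants may differ.
var scrapeAnnotationKeys = []string{"prometheus.io/scrape", "prometheus.io/port"}

// removeScrapeAnnotations sketches the disable path: when a cluster opts out
// via the cluster-level override, previously injected scrape annotations are
// stripped so the collector stops discovering those pods, while unrelated
// annotations are left untouched.
func removeScrapeAnnotations(annotations map[string]string) {
	for _, k := range scrapeAnnotationKeys {
		delete(annotations, k)
	}
}

func main() {
	ann := map[string]string{
		"prometheus.io/scrape": "true",
		"prometheus.io/port":   "9187",
		"app":                  "postgres", // unrelated annotation survives
	}
	removeScrapeAnnotations(ann)
	fmt.Println(ann) // map[app:postgres]
}
```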
Testing and Verification
- Added unit tests in `pkg/postgresql/cluster/core/monitoring_unit_test.go` covering flag resolution and the scrape annotation builders for PostgreSQL (port `9187`) and PgBouncer (port `9127`).
- Added integration tests in `internal/controller/postgrescluster_controller_test.go` verifying `InheritedMetadata.Annotations` presence when monitoring is enabled.
Related Issues
CPI-1853 — related JIRA ticket.
Grafana screenshot:
PR Checklist